Breeding system and geospatial variation shape the population genetics of Triodanis perfoliata

Abstract Both intrinsic and extrinsic forces work together to shape connectivity and genetic variation in populations across the landscape. Here we explored how geography, breeding system traits, and environmental factors influence the population genetic patterns of Triodanis perfoliata, a widespread mix‐mating annual plant in the contiguous US. By integrating population genomic data with spatial analyses and modeling the relationship between a breeding system and genetic diversity, we illustrate the complex ways in which these forces shape genetic variation. Specifically, we used 4705 single nucleotide polymorphisms to assess genetic diversity, structure, and evolutionary history among 18 populations. Populations with more obligately selfing flowers harbored less genetic diversity (π: R 2 = .63, p = .01, n = 9 populations), and we found significant population structuring (F ST = 0.48). Both geographic isolation and environmental factors played significant roles in predicting the observed genetic diversity: we found that corridors of suitable environments appear to facilitate gene flow between populations, and that environmental resistance is correlated with increased genetic distance between populations. Last, we integrated our genetic results with species distribution modeling to assess likely patterns of connectivity among our study populations. Our landscape and evolutionary genetic results suggest that T. perfoliata experienced a complex demographic and evolutionary history, particularly in the center of its distribution. As such, there is no singular mechanism driving this species' evolution. Together, our analyses support the hypothesis that the breeding system, geography, and environmental variables shape the patterns of diversity and connectivity of T. perfoliata in the US.


| INTRODUC TI ON
Understanding what factors drive patterns of genetic diversity among populations is central to evolutionary ecology, and critical for predicting how species respond to changing environments (Manel & Holderegger, 2013). Evolution is intrinsically linked to genetic diversity, which often serves as the raw material for evolutionary processes (e.g., Alsos et al., 2012;Jump et al., 2009;Stange et al., 2021).
However, data on quantitative traits are rarely available for wild species across their distributions, limiting our ability to study how traits evolve. Measures of genetic diversity from neutral markers are more readily available and correlations have been found between the differentiation of quantitative traits and neutral markers (e.g., Frankham et al., 1999;Jump et al., 2009;Merilä & Crnokrak, 2001).
The adaptive potential may be limited in naturally occurring populations with low genetic diversity and small effective population sizes (Hobbs & Humphries, 1995;Jump et al., 2009;Lai et al., 2019;Lande & Shannon, 1996).
Studies that consider both intrinsic and extrinsic factors over multiple scales provide a more complete interpretation of what drives patterns of genetic diversity at the intraspecific level (Schregel et al., 2018;Twyford et al., 2020).
Because flowering plants often exhibit high intraspecific variation in reproductive systems, they present novel opportunities to examine the role of breeding systems in influencing patterns of genetic diversity and divergence among populations (e.g., Culley & Stokes, 2012;Sun et al., 2002;Toczydlowski & Waller, 2019). For example, many flowering plants have the capacity for both crossand self-fertilization, a condition termed mixed-mating (Goodwillie et al., 2005;Lande & Schemske, 1985). Self-fertilization presents several benefits in the context of mate availability and range expansion (Baker, 1955;Busch & Delph, 2012). Individuals within populations with relatively high inbreeding exhibit high genetic similarity, reducing effective population size. In turn, these populations are highly susceptible to genetic drift and subsequent loss of genetic diversity and potentially greater susceptibility to changes in the external environment (Lande, 1988;Lande & Shannon, 1996;Wright, 1946). Inbreeding, particularly in smaller populations, also drives differentiation among populations, in parallel with patterns expected for geographically isolated populations (Lowe et al., 2005;Toczydlowski & Waller, 2019;Wright, 1965). Despite the potential importance of plant reproduction in driving patterns of genetic diversity and connectivity, this feature of species is still poorly understood in the context of population genetics at large spatial scales in wild populations.
Among populations, a common explanation for spatial patterns of genetic diversity is isolation-by-distance (IBD), where populations that are geographically isolated exhibit greater genetic differentiation via attenuated gene flow and genetic drift (Wright, 1943, Slatkin, 1993, e.g., Toczydlowski & Waller, 2019, Hellwig et al., 2021. While greater inbreeding may drive genetic isolation at a local scale, IBD often has a greater influence at broad spatial scales. Larger populations are typically less susceptible to drift, but founder effects and population bottlenecks can still drive genetic differentiation in isolated populations, especially following colonization events (Toczydlowski & Waller, 2019;Wright, 1977). In addition to IBD, many studies invoke key roles for variation in topography and climate in mediating spatial distribution patterns, which either restrict dispersal (e.g., mountains, rivers, etc.), or act as suitable corridors for gene flow (isolation by the environment; McRae, 2006). Species also exhibit specific ecological tolerances that dictate spatial patterns of gene flow and migration (Chan et al., 2011;Sexton et al., 2014;Wang & Summers, 2010). Across heterogeneous landscapes, areas of ecological tolerance for a species may be more limited, resulting in increased genetic divergence among populations by reducing dispersal corridors (Wang & Bradburd, 2014). Incorporating estimates of environmental tolerance with IBD provides a more realistic framework for understanding population connectivity across landscapes (e.g., Alvarado-Serrano & Hickerson, 2018;Cruzan & Hendrickson, 2020;Cushman et al., 2009;Hevroy et al., 2018;Toczydlowski & Waller, 2019;Wang & Bradburd, 2014).
Integrating genetic data with landscape and environmental parameters can better describe the range of factors driving or maintaining patterns of genetic diversity among populations (e.g. Alvarado-Serrano & Hickerson, 2018;Chan et al., 2011;Cruzan & Hendrickson, 2020). Here we explicitly examine how breeding system variation, geographic distance, and habitat suitability may be integrated to explain spatial patterns of genetic diversity in 18 populations of Triodanis perfoliata (Campanulaceae), a widespread, annual native to North and South America (Weakley, 2010). All individuals of this species exhibit dimorphic cleistogamy, consisting of both obligately self-fertilizing flowers and flowers that can either self-or cross-fertilize (Gara & Muenchow, 1990;Trent, 1942). Because of the high potential for inbreeding in populations of T. perfoliata, we also examine how breeding system may correlate to metrics of genetic diversity and influence overall patterns of genetic structure.
At broad geographic scales we predict high levels of population structure and relatively high population genetic divergence; at this scale we predict that both isolation by distance (IBD) and isolation by environment (IBE) will be the strongest factors structuring genetic diversity. Both geographic distance and variance in biotic and abiotic factors can limit gene flow or shape the potential for gene flow through particular corridors. Therefore, we explicitly discern the roles of geographic isolation (IBD) and environment (IBE) in shaping observed patterns. At local spatial scales, we predict that populations with a greater allocation to cleistogamy will exhibit reduced genetic diversity and high population genetic structuring (average F st ) due to increased inbreeding and that the breeding system will be a more important factor influencing population-level genetic patterns. Finally, we use habitat suitability models to predict routes of dispersal among contemporary populations. Incorporating genetic data into these analyses provides a framework for understanding corridors of gene flow among our study populations.
In concert with our other predictions, we expect the models to reflect limited gene flow among geographically or environmentally isolated population genetic clusters and phylogenetic clades; and that some genetic groups may appear genetically isolated despite the potential for gene flow through these corridors, due to increased selfing or other ecological factors. We analyze both population-and evolutionary genetic relationships to better understand the contemporary and historical connectivity among populations. Overall, we aim to outline a thorough framework of factors driving observed population genetic patterns at both broad and narrow scales.

| Study species & breeding system
Triodanis perfoliata (L.) Nieuwl. (Campanulaceae) is a small, common, annual herb native to North and South America. This weedy annual grows in a variety of conditions including disturbed areas, along rocky outcrops, dry open habitats, and prairies (Gleason & Cronquist, 1991;Weakley, 2010). Seeds of this species are quite small (approx. Length = 0.5 mm, width = 1.3 mm) and may be dispersed by ants (McVaugh, 1948;Shetler & Morin, 1986). Individuals exhibit a mixed mating system via dimorphic cleistogamy that includes two distinct floral types. Chasmogamous (CH) flowers are purple, five-petaled, ~1.5 cm in diameter, and can either outcross or self-fertilize; cleistogamous (CL) flowers completely lack a corolla and are obligately self-fertilizing (Gara & Muenchow, 1990;Goodwillie & Stewart, 2013;Trent, 1940). All individuals of T. perfoliata exhibit both floral types and there is considerable variation among populations in the relative production of CH to CL flowers . Some of this breeding system variation is driven by variation in pollination visitation and abiotic conditions .

| DNA collection, extraction, and sequencing
In late spring and early summer 2017, leaf tissues were collected in the field from 18 populations of T. perfoliata (total = 76 individuals; range = 1-6 individuals/population) spanning the contiguous US ( Figure 1a), and from 6 individuals of T. biflora from southeast Missouri (Midwestern US) to serve as an outgroup for phylogenetic analyses. We used a CTAB protocol (Doyle & Doyle, 1987) to extract high quality genomic DNA from silica dried leaf tissue. Subsequently, RADSeq (Restriction site Associated DNA Sequencing) was performed at Floragenex, Inc. to identify genetic variants (Eaton, 2014).
The restriction enzyme Sbf1 generated short fragments prior to the addition of sequencing adapters, and all samples were analyzed on the same flow cell with Illumina 1x91bp sequencing. After sequencing, quality control and sequence alignment were conducted using Bowtie (Langmead & Salzberg, 2012), BWA (Li, 2011) and Velvet (Zerbino, 2010) and variant calling were performed using Samtools (Li et al., 2009). The final dataset consists of variant calls with a F I G U R E 1 Sample localities and models of population connectivity. (a) Study site localities (n = 18); filled markers indicate sites for which breeding system traits were estimated (n = 9). (b) Population connectivity among all sites (c) population connectivity among genetic clusters (k = 4). (d) Population connectivity among major phylogenetic clades. Dark lines depict least-cost paths. Groups with no connections represent either clades or cluster groups that exist only at that locality. minimum sequencing depth of 15x, minimum Phred score of 20, and no more than 10% of missing genotypes.
A total of 9,716,774 raw reads were generated, of which 9,657,413 passed quality filters. These were used to build 5,646,126 provisional clusters, i.e., groups of sequencing read that likely cover the same position in multiple samples, each with a minimum cluster depth of 5x and maximum cluster depth of 1500x. After reading alignment and quality assessment, this yielded a final assembly that was approximately 5.2 Mb in length, consisting of 56,6649 contigs, each with a length of 92 bp. An average of 38.9% of the sequence reads from each sample aligned to a single position in this assembly. Variant calling yielded 4705 single nucleotide polymorphic (SNP) sites observed >90% of the sequenced individuals of T. perfoliata.

| Phylogenetic tree estimation
We used RAxML V8 (Stamatakis, 2014) to create a maximum likelihood phylogeny from over 6000 SNPs. Phylogenetic trees were generated using ASC_GTRGAMMA model of nucleotide evolution, which is an ascertainment bias general-time-reversible model (Lewis, 2001). Phylogeny support was estimated by using 10,000 rapid bootstrapped trees. Direct confirmation was conducted by splitting the data set into five subsets, each consisting of 1200 SNPs, and generating phylogenies using the same parameters as the complete data set to ensure the absence of major deviations in the resulting inferences. To test the hypothesis that populations with a greater allocation to CL flowers will exhibit greater overall population structuring, we performed a linear model between pCH and mean population pairwise FST and to test the hypothesis that populations exhibiting greater proportional production of CL flowers may maintain less genetic diversity, we performed linear regressions between metrics of genetic diversity (e.g., π, number of polymorphic sites) and pCH via a linear regression using the lm function in R statistics v 4.1.1 (R Core Team, 2021).

| Predicting dispersal networks
2.6.1 | Creating the SDM Likely routes of dispersal among populations or genetic groups were predicted via least-cost corridor analyses, an approach that incorporates species distribution models (SDMs; Chan et al., 2011).
SDMs were generated using occurrence records collected between the years 2000-2019 obtained from digital herbarium vouchers, primary literature, our lab fieldwork, and open-source occurrence data (n = 4503 initial records; GBIF.org 2020; addn. Berg et al., 2019). Data were first vetted for taxonomic assignment as well as apparent labeling errors (e.g., data points in oceans). Spatial clusters of localities can cause models to over-fit toward environmental biases and inflate model performance values (Boria et al., 2014;Hijmans, 2012;Veloz, 2009). Spatial biases were addressed by randomly selecting points clustered within a 10-km radius using SDMtoolbox 2.4 (Brown, 2014). The final vetted dataset consists of 1735 occurrence records. Nineteen bioclimatic layers at a 30 arc-minute resolution from WorldClim v2.0 (Hijmans et al., 2005) were used to generate species distribution models (SDM) in MaxEnt 3.3.3k (Phillips et al., 2020). SDMs were parameterized with SDMtoolbox v2.4 (Brown, 2014), to evaluate the performance of various combinations of five feature classes (linear; linear and quadratic; hinge; linear, quadratic and hinge; and linear, quadratic, hinge, product and threshold), and five regularization multipliers (0.5, 1, 2, 3, 4; Radosavljevic & Anderson, 2014). SDM performance built under each combination of parameters was assessed through a geographically structured k-fold cross-validation (i.e., the occurrence records were partitioned into k equal geographically clustered subsamples, here k = 3, and the models were trained with two of the groups and then evaluated with the excluded group until all group combinations were run). Model fit was assessed via the omission rate, area under the curve (AUC), and model feature class complexity (Brown, 2014). After optimum model parameters were determined (those leading to the lowest omission rate, highest AUC, and lowest complexity, in the order listed), a final SDM was built with all occurrence sites and projected into the current climate across the contiguous US, southern Canada, as well as northern Mexico.

Data and citations available in
The final SDM estimates contemporary habitat suitability and was used to estimate potential dispersal networks among populations and genetic groups of T. perfoliata in our genetic analyses.
These Least-Cost Corridors (LCCs) are estimated by inverting the SDM (one minus SDM suitability values) to function as a friction layer, characterizing the cost of dispersal through each pixel in the landscape; areas of high suitability have a lower dispersal cost compared to areas of low suitability (Chan et al., 2011). We examined multiple separate scenarios to understand how connectivity among these populations may influence patterns of genetic diversity.

| Population connectivity (SDM only)
In the first scenario, dispersal networks were estimated between all 18 populations included in our genetic analyses. This model serves as a null hypothesis by solely considering how habitat suitability predicts population connectivity in the absence of genetic data. Connectivity among genetic clusters. In the second scenario, dispersal networks were estimated among the genetic groups described in analyses of genetic structure (most likely K value). This scenario describes the role of likely dispersal corridors in shaping the genetic structure seen across the sampled landscape.
Connectivity among clades. In the third scenario, dispersal networks were estimated among major phylogenetic clades. Here we subjectively split the phylogeny into subclades by placing a vertical line near the base of the tree (see dashed line in Figure 3), which split the phylogeny into 8 evolutionary groups that each share common ancestry with clade members. These sub-clade groupings were chosen, in part, because they matched our structure groups (though sub-divided due to discordance in our results) and each sub-clade was assigned entirely to the same cluster group (with the exception of the clade containing the NY cluster). Decreasing the sub-clade group number (by moving the vertical line toward to most-recent common ancestors) would have resulted in more clades containing mixed cluster groups, whereas increasing groupnumber would have removed deeper evolutionary relationships into dispersal corridors calculation.

| Examination of IBD and IBE
To quantitatively test the relationships between the observed genetic divergence and both IBD and IBE, we used Multiple Matrix Regression with Randomization (MMRR) analyses in R (see Wang et al., 2013 for scripts to perform analysis). For these analyses, we first generated the following four distance matrices: genetic distance, geographic distance, environmental least-cost path distance, and environmental least-cost path total resistance. Genetic distance was quantified by measuring the inter-population FST among the 18 research sites in Arlequin v3.5 (Excoffier & Lischer, 2010). using SDMtoolbox v2.4 (Brown et al., 2017). The two raw IBE matrices, distance and resistance values, were highly correlated with the IBD matrix (R 2 = .988 and R 2 = .914, respectively). To remove the explicit effects of geographic distance from our two IBE matrices, we performed a linear regression in which each IBE distance matrix was a response variable and our geographic distance matrix was the predictor variable (Davies et al., 2007;Fritz & Rahbek, 2012;Vale et al., 2018). We used the resulting residuals output from each linear regression, one where environmental least-cost path (LCP) distance was input and a second where the environmental LCP total resistance was input (correlation to IBD matrix: R 2 = .156 and R 2 = .406, respectively), as our corresponding IBE distance matrices in separate MMRR analyses. In each MMRR analysis, genetic distance was used as a response variable, whereas the IBD and IBE matrices were each considered individually as a single predictor variable. To assess the statistical significance of each MMRR analysis and predictor variables, we used 10,000 permutations in each analysis (Wang et al., 2013).

| Genetic diversity and population structure
RADSeq yielded approximately 3.8 million bases and n = 631,465 total genotypes. We found n = 4705 single nucleotide polymorphisms (SNPs) in aligned sequence sites for T. perfoliata. These SNPs were presumably random except for their proximity to the Sbf1 restriction site that was used to generate the library. There is no reason to believe there is systematic selection occurring on the eight bases that constitute the Sbf1 target sequence (CCTGCAGG). The final, stringent list of variants contained data for 97% of genotypes; 3% of genotypes were not sequenced or found to have low quality, and therefore were considered missing. Total heterozygosity was relatively low (16%) and a potential signature of high inbreeding in this species. The number of private alleles, π, and polymorphic sites varied widely among the 18 populations, indicating considerable variation in genetic diversity (Table 1) and some populations from the Eastern US also clustered together (e.g., SC, NC2, NJ Figure 2b). We also accessed genetic clustering using the package sMNF (Frichot et al., 2014). sMNF is robust to departures from population genetic assumptions; results were overall very similar to our results from STRUCTURE ( Figure S1).

| Phylogenetic tree estimation
Phylogenetic analyses revealed three major geographic clades of T. perfoliata within the US (Figure 3). A couple of interesting phylogenetic patterns concur with specific results from genetic clusters from STRUCTURE: populations from KS and CO share a clade with populations from CA and WA, despite the significant geographic distance between these sites. In addition, individuals from a population in VA appear in both the clade consisting primarily of Eastern state populations and the clade of Midwestern and Gulf state populations.
But overall, our phylogeny shows discordance with results from STRUCTURE (see colored boxes in Figure 3 correspond to colors associated with K = 4 cluster Figure 2a). Only two genetic clusters, NY and the KS, CO, CA, WA cluster form monophyletic groups.
We acknowledge that some bootstrap support values are relatively low, particularly among individuals from the Midwest, whereas the coastal populations generally have higher support. In general, cluster groups with mixed group assignment (K = 17 Figure 2b) also had much lower support in our phylogenetic trees (e.g., compare the mixed cluster groups KS2 and NC2 with their corresponding phylogenetic support values). Note: By population: Number of private alleles, π, number of polymorphic sites generated in Arlequin. For nine populations, pCH indicates an estimate of the population breeding system (average proportion of CH flowers to CL flowers); addn. Breeding system information: Table S3. Longitude and latitude measured in decimal degrees.
Across these nine populations, mean population pairwise F ST was significantly negatively correlated with pCH; populations producing relatively more CL flowers exhibited greater overall population substructuring (p = .0015, multiple R 2 = .78, F = 25.19; Table S4).
In addition, the relative proportion of CH flower production (pCH) is significantly positively correlated to multiple metrics of genetic diversity (π: R 2 = .63, p = .01, Figure 4a; number of polymorphic sites: R 2 = .63, p = .01). These strong associations support the hypothesis that breeding system strongly influences population genetic patterns. Populations with a greater allocation to CL flowers exhibit reduced genetic diversity and increased population substructuring, as predicted by reduced gene flow among highly selfing populations.

| Predicting dispersal networks
Our final SDM had a high predictive performance with an Omission Rate (OR) < 0.01 and a moderate discriminatory ability with an AUC = 0.62 (Supp. Figure 2). The resulting distribution prediction matches our expert knowledge of the species' distribution.

| Incorporating genetic information in LCCs
To better visualize the likely dispersal corridors among groups, we incorporated genetic data into the LCC corridor calculations using two methods. Connectivity within population genetic clusters. The first method, the analysis of potential corridors among K = 4 genetic clusters, differs greatly from the null model (which compared all populations in LCC creation).
Using genetic clusters imposes key roles for factors such as demography and lineage history in shaping the genetic landscape in addition to IBD and IBE (Figure 1c). In this model, the

| Examination of IBD and IBE
Geography (or IBD) was the strongest predictor of our observed genetic diversity across the sampled distribution (R 2 = .189, p < .01; Figure 4b). However, environment with the signal of IBD removed, also significantly predicted the observed genetic diversity F I G U R E 3 RaxML maximum likelihood phylogenetic tree. Colors boxes indicate primary individual identity to genetic clusters corresponding to the plot K = 4, (Figure 2). Letters indicate the population (by US state, numbers for multiple populations in a single state); numbers indicate the individual identifier. Truncated branch length at base has a length of 0.0603. Small inset tree is a tree without basal nodes trimmed. Branch lengths and scale bar refer to the number of nucleotide substitutions per site. Inset image is modified from a handcolored lithograph by Endicott based on an illustration from John Torrey's a Flora of the state of New York (Torrey, 1843  (IBE distance, R 2 < .001, p = .945, Figure 4c; IBE total resistance R 2 = .058, p = .048, Figure 4d), however with a lower R 2 when compared to IBD.

| DISCUSS ION
Here we explicitly demonstrate how a variety of factors, both intrinsic (i.e., breeding system variation) and extrinsic (i.e., IBD, IBE), drive patterns of genetic diversity and population divergence in the mixed-mating annual, Triodanis perfoliata (Campanulaceae).
Specifically, we found that populations with a greater proportion of cleistogamous (obligately selfing) flowers had reduced genetic variation compared to populations with relatively more open flowers (π: R 2 = .63, p = .01). Greater inbreeding also tends to increase population structure and reduce gene flow among populations (Lande, 1988;Lande & Shannon, 1996;Wright, 1946). In concert with these predictions, we also found a strong negative association between mean population pairwise F ST  Therefore, we also explored models of population connectivity to better understand the factors shaping the genetic landscape of this widespread, mixed-mating species.
Reproductive systems can influence population dynamics in complex ways, driving patterns of genetic diversity, gene flow, and demographics, and influencing fitness (e.g., Charlesworth & Charlesworth, 1987;Lande & Shannon, 1996;Wright, 1946). Our findings are consistent with some studies that have also found significant population genetic structuring in dimorphic cleistogamous species (e.g., Lesica et al., 1988;Schoen, 1984;Sun et al., 2002;Toczydlowski & Waller, 2019). However, some cleistogamous species can also exhibit relatively high levels of genetic diversity and low population genetic structure (e.g., Cortés-Palomec et al., 2006;Culley & Wolfe, 2001). These patterns demonstrate that the impact of cleistogamy on population genetic patterns is context-dependent because some populations can still exhibit high levels of outcrossing in chasmogamous flowers (Culley & Wolfe, 2001). F I G U R E 4 (a) Correlation between breeding system (estimated as the average proportion of cleistogamous to chasmogamous flowers produced among individuals in a population: pCH) and genetic diversity (π: The number of nucleotide differences per site between two randomly chosen sequences from a population) in n = 9 populations. MMRR correlations between genetic distance and: (b) Euclidian distance (in decimal degrees), (c) environmental least-cost path distance, (d) environmental LCP total resistance. *For both (c) and (d), the matrices analyzed represented residuals of a lineage regression of the raw environment matrix and geographic distance. This removed the effect of geographic distance from these MMRR analyses. Another important feature of cleistogamous species, including T. perfoliata, is plasticity in reproductive allocation to cleistogamy due to abiotic and biotic factors Jones et al., 2013). In this study, we took advantage of population-level variation in the breeding system to address if greater average cleistogamy impacts population genetic diversity in T. perfoliata. We acknowledge that in this study we did not directly link the breeding system to quantitative estimates of inbreeding (F IS ). However, strong correlations between the extent of cleistogamy at the population level and metrics of genetic diversity suggest a meaningful relationship that warrants subsequent investigation. Inbreeding depression is an obvious potential negative consequence of greater inbreeding (Charlesworth & Charlesworth, 1987;Culley & Klooster, 2007), but our previous work demonstrated relatively low inbreeding depression for three populations of T. perfoliata under greenhouse conditions (Ansaldi et al., 2019). Nonetheless, here we demonstrate that populations with more average cleistogamy harbor less genetic diversity and reduced genetic diversity may result in long-term demographic consequences that are more difficult to quantify on short time scales (Hobbs & Humphries, 1995;Jump et al., 2009;Lai et al., 2019;Lande & Shannon, 1996). Variation in relative production of cleistogamy in T. perfoliata is influenced by a variety of factors including soil type, water availability and pollinator environment . Taken together, dimorphic cleistogamy presents a novel opportunity to understand the multifaceted nature in which reproductive systems and ecology influence population genetic patterns.
Increased inbreeding, geographic distance, and environmental resistance will all lead to population divergence and strong patterns of population sub-structuring. For T. perfoliata, we found evidence of high population sub-structuring and strong geographic signals describing regional connectivity among our study populations. analyses, a method that uses SDMs to find predicted corridors (Chan et al., 2011). As a null model, we first predicted dispersal corridors among all of our 18 study populations in the absence of additional genetic information (Figure 1b). This type of model upweights the relationships among distantly related individuals, which typically are more geographically isolated. Because of the indiscriminate nature of these types of analyses, a landscape can be hyper-connected, with all populations being connected via corridor pathways. Such a landscape ignores spatial-temporal dynamics associated with historical habitat and climate change (i.e., glacial cycles or climatic refugia). To build a more realistic model of landscape connectivity, we next incorporated genetic relationships into our landscape connectivity model. We modeled population connectivity among K = 4 genetic clusters, and this model was markedly different from the null model (Figure 1c). In the genetic cluster model, no dispersal corridors are predicted to connect with the NY population, even with geographically close populations. In other areas, we see broad patterns, predicting high connectivity in areas of high habitat suitability (Midwestern US) and reduced connectivity for geographically distanced localities, likely due to isolation by distance. This model also seems to account for barriers to dispersal, particularly the Appalachian and the Rocky Mountains, which either limit or reduce connectivity to the Midwestern US. We further explored how demographic and lineage history can influence genetic connectivity by integrating phylogeny into our analyses (Figures 1d and 3).
Due to methodological differences between our phylogenetic analyses and genetic clusters, the phylogenetic results aim to elucidate evolutionary relationships among populations and should clarify geographic regions that were historically connected at deeper timescales. In most cases, discordance between our structure and phylogenetic results (Figures 2 and 3) seem to reflect cases in which recent and frequent gene flow is likely between geographically close sites (i.e., admixture). Some of this discordance between our population genetic analyses and our phylogenetic analysis might be indicative of more complicated evolutionary histories, with isolated evolutionary lineages coming back into contact periodically. The low observed phylogenetic support values in the Midwest could be a result of such phenomena, though such hypotheses require proper statistical and population genomic analyses (see Figure 3; e.g., TreeMix, Pickrell & Pritchard, 2012). This is further reinforced by the differences between our phylogenetic LCC ( Figure 1d) and our genetic cluster LCC (Figure 1c). Most notable is a lack of connection between Midwestern US populations on either side of the Mississippi River in the phylogenetic analyses, whereas populations are connected in the cluster LCC.
Our study encompasses a wide geographic area, and our data reflects both the implications of factors more predominant at the population level (e.g., breeding system) and factors affecting landscape-scale patterns such as geographic distances (Husband & Barrett, 1996;Slatkin, 1987). For these reasons, future work will explore how current landscape genetic patterns may have been influenced by historic climatic patterns that may limit or facilitate population connectivity. For example, research will incorporate paleoclimate data with time-calibrated phylogenies to estimate temporally relevant connections (for example see Guillory & Brown, 2021;French & Brown, 2022 in preparation). In addition, ongoing work aims to quantify how the breeding system explicitly influences patterns of demography and fitness among populations.
Overall, our analyses of T. perfoliata illustrate the influence of the breeding system, geography, and the environment in shaping population genetic patterns in this widespread, mixed mating wildflower.

AUTH O R CO NTR I B UTI O N S
Morgan Tackett: Conceptualization (equal); data curation ( . We are also grateful for reviewer comments which greatly improve the manuscript.

CO N FLI C T O F I NTE R E S T
The authors claim no conflict of interest

FU N D I N G I N FO R M ATI O N
This work was supported by startup funds from Southern Illinois University, Carbondale to J. Weber, and additional research and startup funds from Southeast Missouri State University to J. Weber and R. Ruggiero.

DATA AVA I L A B I L I T Y S TAT E M E N T
Genomic data and breeding system data generated for this project are available via Dryad (https://doi.org/10.5061/dryad.sf7m0cg98 and https://doi.org/10.5061/dryad.wh70r xwr9).